The citizens of the world are vast and diverse across the 150+ countries on the planet, and thus the perceptions of one country's citizens can vary greatly from another's. The World Happiness Report aims to collect and quantify this information to see what people around the world think of their country and the direction it might be going in. The report has not been without controversy; specifically, it is debated whether the measured metrics are skewed in a particular direction that puts some countries at a disadvantage or misrepresents citizens' true feelings about their country.
This is a detailed analysis of the World Happiness Reports from 2015-2019 to see what makes citizens happy with their country and what the major contributors to that happiness are. Along with this, I will analyze the metrics to see if the criticism about how they are measured holds true for the happiness reports. This will be done by analyzing their relationship to the overall happiness score (which determines a country's ranking in the report) and by plotting the data on geographic maps to bring everything into a single view, showing how the data looks from a holistic perspective. This should expose trends between countries and make it easier to see not only what direction a country might be heading in, but also what it might be lacking for its citizens.
Download Location: https://www.kaggle.com/unsdsn/world-happiness
import ssl
import warnings
import pycountry
import numpy as np
import pandas as pd
import seaborn as sb
import pandas_profiling as pp
from notebook import __version__ as nbv
# Basemap
from mpl_toolkits.basemap import Basemap
from mpl_toolkits.basemap import __version__ as basev
# scipy Libraries
from scipy.stats import norm, stats
from scipy import __version__ as scipv
# matplotlib Libraries
import matplotlib.pyplot as plt
from matplotlib import __version__ as mpv
# plotly Libraries
import plotly.express as px
import plotly.graph_objects as go
from plotly import __version__ as pvm
# Library Versions
lib_info = [('ssl', ssl.OPENSSL_VERSION.split(' ')[1]), ('scipy', scipv), ('numpy', np.__version__),
            ('pandas', pd.__version__), ('plotly', pvm), ('seaborn', sb.__version__),
            ('pycountry', pycountry.__version__), ('matplotlib', mpv), ('pandas_profiling', pp.__version__),
            ('mpl_toolkits.basemap', basev), ('Jupyter Notebook (notebook)', nbv)]
print('Library Versions\n' + '='*16)
for name, vers in lib_info:
    print('{:>27} = {}'.format(name, vers))
rep2015 = pd.read_csv('Report_Data/2015.csv')
rep2016 = pd.read_csv('Report_Data/2016.csv')
rep2017 = pd.read_csv('Report_Data/2017.csv')
rep2018 = pd.read_csv('Report_Data/2018.csv')
rep2019 = pd.read_csv('Report_Data/2019.csv')
print("Dataset Dimensions: {:,} columns and {:,} rows".format(rep2015.shape[1], rep2015.shape[0]))
rep2015.head()
print("Dataset Dimensions: {:,} columns and {:,} rows".format(rep2016.shape[1], rep2016.shape[0]))
rep2016.head()
print("Dataset Dimensions: {:,} columns and {:,} rows".format(rep2017.shape[1], rep2017.shape[0]))
rep2017.head()
print("Dataset Dimensions: {:,} columns and {:,} rows".format(rep2018.shape[1], rep2018.shape[0]))
rep2018.head()
print("Dataset Dimensions: {:,} columns and {:,} rows".format(rep2019.shape[1], rep2019.shape[0]))
rep2019.head()
From the heads of the various datasets above, we can see that none of them are in the same format, especially their column names. In order to combine all of the datasets correctly, they will need to be parsed and remapped accordingly.
Columns starting with Happiness or Whisker, along with Dystopia.Residual, are the targets, just under different names. The Dystopia Residual compares each country's scores to those of the theoretical unhappiest country in the world. Since the reports from different years use different naming conventions, a common name will need to be abstracted in order to combine them all correctly.
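To make the naming mismatch concrete, here is a quick sketch (using the per-year frames loaded above) that prints each report's column names; the differing conventions should be immediately visible:
# Print each report's column names so the differing
# naming conventions across years are visible at a glance
for year, df in [(2015, rep2015), (2016, rep2016), (2017, rep2017), (2018, rep2018), (2019, rep2019)]:
    print(year, list(df.columns), '\n')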
# This function takes the relevant report dataset and
# year in order to parse the data into a usable format
def parse_report(report_df, year):
    # Rename columns of reports 2018 and 2019 to match
    # that of the earlier reports (2015, 2016, 2017)
    if 2017 < year < 2020:
        report_df.rename(columns={'Overall rank': 'Happiness Rank', 'Country or region': 'Country',
                                  'Score': 'Happiness Score', 'GDP per capita': 'Economy (GDP per Capita)',
                                  'Social support': 'Family', 'Healthy life expectancy': 'Health (Life Expectancy)',
                                  'Freedom to make life choices': 'Freedom',
                                  'Perceptions of corruption': 'Trust (Government Corruption)'}, inplace=True)
    targets = ['Low', 'Low-Mid', 'Top-Mid', 'Top']
    df_cols = ['Country', 'Rank', 'GDP', 'Family', 'Health', 'Freedom', 'Generosity', 'Trust']
    # Load report data into common columns
    target_cols = []
    for col in df_cols:
        target_cols.extend([new_col for new_col in report_df.columns if col in new_col])
    df = pd.DataFrame()
    df[df_cols] = report_df[target_cols]
    df['Happiness Score'] = report_df[[col for col in report_df.columns if 'Score' in col]]
    # Calculate quartiles on the data.
    df["Target"] = pd.qcut(df[df.columns[-1]], len(targets), labels=targets)
    df["Target_n"] = pd.qcut(df[df.columns[-2]], len(targets), labels=range(len(targets)))
    # Insert Year column
    df.insert(1, 'Year', pd.Series([year] * len(report_df)))
    return df
report_data = parse_report(rep2015, 2015)
for repData, year in [(rep2016, 2016), (rep2017.round(5), 2017), (rep2018, 2018), (rep2019, 2019)]:
    report_data = report_data.append(parse_report(repData, year), sort=False)
report_data = report_data.reset_index(drop=True)
fix_names = [('Taiwan Province of China', 'Taiwan'), ('Macedonia', 'North Macedonia'),
             ('Hong Kong S.A.R., China', 'Hong Kong'), ('Trinidad & Tobago', 'Trinidad and Tobago')]
for wrong_name, right_name in fix_names:
    report_data.loc[report_data.Country == wrong_name, 'Country'] = right_name
# Rename "Happiness Score" column to "Happiness_Score",
# "Health" column to "Life_Expectancy" and "Trust" to "Gov_Trustworthiness"
report_data.rename(columns={'Happiness Score': 'Happiness_Score', 'Health': 'Life_Expectancy',
                            'Trust': 'Gov_Trustworthiness'}, inplace=True)
print("Combined Dataset Dimensions: {:,} columns and {:,} rows".format(report_data.shape[1], report_data.shape[0]))
report_data.head()
print('Missing Value Counts for Each Column\n' + '='*36)
print(report_data.isnull().sum())
print('\n\nRow(s) in dataset with missing data:')
report_data[report_data['Gov_Trustworthiness'].isna()]
We can see that the row with the missing data came from the 2018 report. Because there is only one row with missing data, and the extent of the analysis does not hinge on it, the row will not be removed and will be left as is.
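For completeness, if the missing value ever did need handling, a minimal sketch of the two usual options (dropping the row, or imputing with the mean of the same report year) might look like this; neither is applied to report_data here:
# Option 1: drop the row with the missing Gov_Trustworthiness value
cleaned = report_data.dropna(subset=['Gov_Trustworthiness'])
# Option 2: impute using the mean Gov_Trustworthiness of the same year
year_means = report_data.groupby('Year')['Gov_Trustworthiness'].transform('mean')
imputed = report_data.copy()
imputed['Gov_Trustworthiness'] = imputed['Gov_Trustworthiness'].fillna(year_means)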
print("Describe Dataset:")
report_data.describe()
fig = plt.figure()
fig.subplots_adjust(hspace=0.8, wspace=0.5)
fig.set_size_inches(13.5, 13)
sb.set(font_scale = 1.25)
warnings.filterwarnings('ignore')
i = 1
for var in report_data.columns:
    try:
        fig.add_subplot(4, 2, i)
        sb.distplot(pd.Series(report_data[var], name=''), bins=50,
                    fit=norm, kde=False).set_title(var + " Histogram")
        plt.ylabel('Count')
        i += 1
    except ValueError:
        pass
fig.tight_layout()
warnings.filterwarnings('default')
# Combined Happiness Reports Profiling Report
pp.ProfileReport(report_data).to_notebook_iframe()
plt.rcParams['figure.figsize'] = (15, 10)
plt.rcParams.update({'font.size': 13})
sb.set(font_scale = 1.5)
sb.set_style(style='white')
sb.heatmap(report_data.corr(), annot=True, linewidth=1).set_title('Annotated Correlation Matrix of Combined Dataset')
It looks like GDP, Family, and Life Expectancy are strongly correlated with the Happiness score. While Freedom correlates very well with the Happiness score, it is also correlated quite well with all of the other data columns (except Rank). Gov_Trustworthiness still has a moderately good correlation with the Happiness score.
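To back up the visual reading with numbers, the same correlations can be pulled straight from the matrix; a small sketch:
# Rank each numeric column by its correlation with the happiness score
corr_with_score = report_data.corr()['Happiness_Score'].drop('Happiness_Score')
print(corr_with_score.sort_values(ascending=False))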
Below is a pairwise comparison of our variables to give us a bird's-eye view of the distributions and correlations of the dataset. The color is based on quartiles of the Happiness_Score (0%-25%, 25%-50%, 50%-75%, 75%-100%).
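As a quick sanity check on the quartile coloring, this sketch summarizes the Happiness_Score range covered by each Target bin:
# Score range and size of each quartile bin used for the hue
print(report_data.groupby('Target')['Happiness_Score'].agg(['min', 'max', 'count']))
Note that since the quartiles were computed per report year in parse_report, the bin edges overlap slightly across years.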
Note: right-click the graph and select "Open Image in New Tab" to zoom in to get a better view.
fig = plt.figure()
fig.set_size_inches(12, 12)
sb.set(font_scale = 1.25)
sb.pairplot(report_data.drop(['Target_n'], axis=1),
            hue='Target').fig.suptitle("Bird's Eye View of Column Distributions and Correlations", y=1.01)
In the scatterplots, we see that GDP, Family, and Life_Expectancy are quite linearly correlated, with some noise. It is interesting to see that Gov_Trustworthiness has distributions all over the place, with no straightforward pattern evident.
Based on the preprocessing and analysis above, I can see that the data has (essentially) no missing or duplicated values, and that there are some strong correlations between several variables in the dataset. With EDA finished, we will move on to a deeper and more detailed analysis of the data.
In this section we will take a deeper look into the various relationships (highs and lows) between the data columns using interactive plots and data coordination (how the data points connect to each other).
Before we dive deeper into the dataset, let's take a look at the highs and lows for each of the metrics to get a better idea of our range of values.
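As a quick verification of that claim, a one-off sketch:
# Confirm there are no fully duplicated rows and recount missing values
print('Duplicated rows:', report_data.duplicated().sum())
print('Missing values: ', report_data.isnull().sum().sum())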
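As a compact alternative to the per-metric tables below, a sketch that pulls the single highest and lowest country for each metric in one pass:
# Print the extreme country/year pair for every metric
metrics = ['GDP', 'Family', 'Life_Expectancy', 'Freedom',
           'Generosity', 'Gov_Trustworthiness', 'Happiness_Score']
for m in metrics:
    hi, lo = report_data.loc[report_data[m].idxmax()], report_data.loc[report_data[m].idxmin()]
    print('{:>20}: high = {} ({}), low = {} ({})'.format(m, hi['Country'], hi['Year'], lo['Country'], lo['Year']))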
report_data.sort_values(by='GDP', ascending=False).head()
report_data.sort_values(by='GDP', ascending=True).head()
report_data.sort_values(by='Family', ascending=False).head()
report_data.sort_values(by='Family', ascending=True).head()
report_data.sort_values(by='Life_Expectancy', ascending=False).head()
report_data.sort_values(by='Life_Expectancy', ascending=True).head()
report_data.sort_values(by='Freedom', ascending=False).head()
report_data.sort_values(by='Freedom', ascending=True).head()
report_data.sort_values(by='Generosity', ascending=False).head()
report_data.sort_values(by='Generosity', ascending=True).head()
report_data.sort_values(by='Gov_Trustworthiness', ascending=False).head()
report_data.sort_values(by='Gov_Trustworthiness', ascending=True).head()
report_data.sort_values(by='Happiness_Score', ascending=False).head()
report_data.sort_values(by='Happiness_Score', ascending=True).head()
def plotlyScatterPlot(df, col1, col2, xaxis_range):
    slider = [dict(currentvalue={"prefix": "Year: "})]
    fig = px.scatter(df.sort_values('Year'), x=col1, y=col2,
                     title=col2 + " vs. " + col1,
                     animation_frame="Year", animation_group="Country",
                     color="Target", hover_name="Country",
                     hover_data=["Year", "Rank", "GDP", "Family", "Life_Expectancy", "Gov_Trustworthiness"],
                     width=980, height=800).update_layout(sliders=slider, xaxis_range=xaxis_range, yaxis_range=[2, 8])
    fig.show()
One of the biggest criticisms of the World Happiness Report is the almost linear correlation between a country's GDP and its Happiness_Score. This means that countries with a higher GDP will inherently have a higher Happiness_Score (when in reality that might not be the case), while at the same time making lower-GDP countries out to be unhappier than they might actually be.
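To put a number on that criticism before plotting, a sketch using scipy.stats.linregress (imported here explicitly) to measure how linear the GDP-to-score relationship is within each report year:
from scipy.stats import linregress

# Per-year linear fit of Happiness_Score against GDP
for year, grp in report_data.groupby('Year'):
    fit = linregress(grp['GDP'], grp['Happiness_Score'])
    print('{}: r = {:.3f}, r^2 = {:.3f}'.format(year, fit.rvalue, fit.rvalue ** 2))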
plotlyScatterPlot(report_data, 'GDP', 'Happiness_Score', [-0.05, 2.2])
plotlyScatterPlot(report_data, 'Family', 'Happiness_Score', [-0.05, 1.8])
plotlyScatterPlot(report_data, 'Life_Expectancy', 'Happiness_Score', [-0.05, 1.2])
coord_data = go.Parcoords(line=dict(color=report_data['Target_n'], colorscale='Temps'),
                          dimensions=list([
                              dict(range=[report_data['Year'].min(),
                                          report_data['Year'].max()],
                                   tickvals=report_data['Year'].unique(),
                                   label='Year', values=report_data['Year']),
                              dict(range=[0, report_data['Target_n'].max()],
                                   tickvals=report_data['Target_n'].unique(),
                                   ticktext=report_data['Target'].unique(),
                                   label='Targets', values=report_data['Target_n']),
                              dict(range=[(report_data['Rank'] * -1).min(),
                                          (report_data['Rank'] * -1).max()],
                                   label='Rank', values=(report_data['Rank'] * -1)),
                              dict(range=[report_data['GDP'].min(),
                                          report_data['GDP'].max()],
                                   label='GDP', values=report_data['GDP']),
                              dict(range=[report_data['Family'].min(),
                                          report_data['Family'].max()],
                                   label='Family', values=report_data['Family']),
                              dict(range=[report_data['Life_Expectancy'].min(),
                                          report_data['Life_Expectancy'].max()],
                                   label='Life_Expectancy', values=report_data['Life_Expectancy']),
                              dict(range=[report_data['Freedom'].min(),
                                          report_data['Freedom'].max()],
                                   label='Freedom', values=report_data['Freedom']),
                              dict(range=[report_data['Generosity'].min(),
                                          report_data['Generosity'].max()],
                                   label='Generosity', values=report_data['Generosity']),
                              dict(range=[report_data['Gov_Trustworthiness'].min(),
                                          report_data['Gov_Trustworthiness'].max()],
                                   label='Gov_Trust', values=report_data['Gov_Trustworthiness']),
                              dict(range=[report_data['Happiness_Score'].min(),
                                          report_data['Happiness_Score'].max()],
                                   label='Happy_Score', values=report_data['Happiness_Score'])
                          ]))
layout = go.Layout(
title = '''Interactive Parallel Coordinate Plot
<br><sup>(Click and Drag Vertically Along the Axes to Apply Filters)</sup>''',
title_y=0.98, height=850, font=dict(size=15, color='black')
)
go.Figure(data=coord_data, layout=layout)
From the interactive plots, we can see that overall countries seem to be trending toward the right (higher/better scores), which is good, because it would not be a good look for the world if countries as a whole were getting worse. There were some outliers here and there depending on the metric, but so far it seems to hold true that the higher the three highly correlated metrics identified in Part 1, Step 8 (GDP, Family, and Life_Expectancy) are, the happier the country is.
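That reading can be sanity-checked numerically; a sketch of the yearly means of the three correlated metrics and the score (if countries are broadly improving, these should creep upward):
# Yearly means of the key metrics and the overall score
print(report_data.groupby('Year')[['GDP', 'Family', 'Life_Expectancy', 'Happiness_Score']].mean().round(3))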
This section will focus on plotting geo-maps to bring all the data into a world view. I will be using Basemap from mpl_toolkits.basemap and Choropleth Maps from plotly.express to do the map plotting.
To plot maps with Plotly, I'll need to use the 3-letter country codes (ISO_Alpha 3), and to get those I'll be scraping the "Current codes" table from the ISO_3166-1 Wikipedia page using pandas.read_html().
# This ssl line is needed to allow pandas to load the table
# from Wikipedia; otherwise an SSL "Invalid Certificate" error occurs.
# I'm unsure if this will happen on other systems, but I was unable to fix it on mine
ssl._create_default_https_context = ssl._create_unverified_context
# Load in Wikipedia data table
world_codes = pd.read_html('https://en.wikipedia.org/wiki/ISO_3166-1')[1].rename(
    columns={'English short name (using title case)': 'World_Country',
             'Alpha-2 code': 'ISO_a2', 'Alpha-3 code': 'ISO_a3'})
# If for whatever reason pandas is unable to read the data correctly from the wikipedia page above,
# I have included the data in a csv file to be read from instead: "Wikipedia_ISO_3166-1.csv"
# Uncomment the line below and comment the lines above to read from the csv file instead of the website
# world_codes = pd.read_csv('Report_Data/Wikipedia_ISO_3166-1.csv') # (Oct 9th, 2021)
# Get 3 letter country codes from pycountry
countries = {}
for country in pycountry.countries:
    countries[country.alpha_3] = country.name
world_codes = world_codes[world_codes.columns[:-3]]
world_codes['Country'] = [countries.get(country, 'Unknown Code') for country in list(world_codes['ISO_a3'])]
# Parse country names to make sure that they match the names in our dataset
# As you can see, there are a few that needed to be mapped manually
for country in world_codes['Country']:
    if "Unknown Code" in country:
        world_codes.loc[world_codes.Country == country,
                        'Country'] = world_codes.loc[world_codes.Country == country, 'World_Country']
    elif "Côte d'Ivoire" == country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Ivory Coast"
    elif "Eswatini" == country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Swaziland"
    elif "Viet Nam" == country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Vietnam"
    elif "Congo" == country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Congo (Brazzaville)"
    elif "Congo," in country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Congo (Kinshasa)"
    elif "Korea" in country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "South Korea"
    elif "Czech" in country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Czech Republic"
    elif "Russia" in country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Russia"
    elif "Somali" in country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Somalia"
    elif "Macedonia" in country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "North Macedonia"
    elif "Lao" in country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Laos"
    elif "Palestin" in country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Palestinian Territories"
    elif "Syria" in country:
        world_codes.loc[world_codes.Country == country, 'Country'] = "Syria"
    else:
        if ',' in country:
            country_part = country.split(',')[0]
            if country_part in country:
                world_codes.loc[world_codes.Country == country, 'Country'] = country_part
print("Dataset Dimensions: {:,} columns and {:,} rows".format(world_codes.shape[1], world_codes.shape[0]))
world_codes.head()
To visualize the maps using Basemap, I need coordinates (latitude and longitude) for the countries; in this case, I'll be using country capitals. This data can be retrieved from this site: http://techslides.com/list-of-countries-and-capitals, but I am specifically scraping the webpage (using pandas.read_html()) for the data table because of improper data formatting in the linked downloadable data sources.
map_coords = pd.read_html('http://techslides.com/list-of-countries-and-capitals')[0]
# Apply Headers to dataframe from first row of table
new_header = map_coords.iloc[0]
map_coords = map_coords[1:]
map_coords.columns = [head.replace(' ', '_') for head in new_header]
map_coords = map_coords.apply(pd.to_numeric, errors='ignore')
# If for whatever reason pandas is unable to read the data correctly from the website above,
# I have included the data in a csv file to be read from instead: "country-capital_coordinates.csv"
# Uncomment the line below and comment the lines above to read from the csv file instead of the website
# map_coords = pd.read_csv('Report_Data/country-capital_coordinates.csv') # (Oct 9th, 2021)
# Some manual country parsing to match the dataset
for country in map_coords['Country_Name']:
    if "Cote d’Ivoire" == country:
        map_coords.loc[map_coords.Country_Name == country, 'Country_Name'] = "Ivory Coast"
    elif "Palestin" in country:
        map_coords.loc[map_coords.Country_Name == country, 'Country_Name'] = "Palestinian Territories"
    elif "Macedonia" in country:
        map_coords.loc[map_coords.Country_Name == country, 'Country_Name'] = "North Macedonia"
    elif "Gambia" in country:
        map_coords.loc[map_coords.Country_Name == country, 'Country_Name'] = "Gambia"
    elif "Republic of Congo" == country:
        map_coords.loc[map_coords.Country_Name == country, 'Country_Name'] = "Congo (Brazzaville)"
    elif "Democratic Republic of the Congo" in country:
        map_coords.loc[map_coords.Country_Name == country, 'Country_Name'] = "Congo (Kinshasa)"
print("Dataset Dimensions: {:,} columns and {:,} rows".format(map_coords.shape[1], map_coords.shape[0]))
map_coords.head()
report_data_codes = report_data.merge(world_codes.drop('World_Country', axis=1), on='Country')
print("Dataset Dimensions: {:,} columns and {:,} rows".format(report_data_codes.shape[1], report_data_codes.shape[0]))
report_data_codes.head()
report_data_coords = pd.merge(report_data_codes,
                              map_coords[['Country_Name', 'Capital_Name', 'Capital_Latitude', 'Capital_Longitude']],
                              left_on='Country', right_on='Country_Name'
                              ).drop('Country_Name', axis=1).sort_values(by=['Country', 'Year'], ascending=True
                              ).reset_index(drop=True)
print("Dataset Dimensions: {:,} columns and {:,} rows".format(report_data_coords.shape[1], report_data_coords.shape[0]))
report_data_coords.head()
The below output is a list of countries that do not have a valid country code and thus were not merged correctly into the dataset.
for country in report_data['Country'].unique():
    if country not in list(report_data_coords['Country'].unique()):
        print(country)
def worldBasemap(df, col1, col2):
    sb.set(style=("white"), font_scale=1.5)
    m = Basemap(projection='mill', llcrnrlat=-60, urcrnrlat=90,
                llcrnrlon=-180, urcrnrlon=180, resolution='c')
    m.drawcountries()
    m.drawparallels(np.arange(-90, 91., 30.))
    m.drawmeridians(np.arange(-90, 90., 60.))
    lat = df['Capital_Latitude'].values
    long = df['Capital_Longitude'].values
    col_color = df[col1].values
    col_size = df[col2].values
    m.scatter(long, lat, latlon=True, c=col_color, s=150*col_size,
              linewidth=1, edgecolors='black', cmap='hot', alpha=1)
    m.fillcontinents(color='#072B57', lake_color='#FFFFFF', alpha=0.4)
    plt.title("World - " + col1 + " vs. " + col2, fontsize=25)
    m.colorbar(label=col1)
plt.figure(figsize=(16, 10))
worldBasemap(report_data_coords, 'Happiness_Score', 'GDP')
plt.figure(figsize=(16, 10))
worldBasemap(report_data_coords, 'Happiness_Score', 'Family')
plt.figure(figsize=(16, 10))
worldBasemap(report_data_coords, 'Happiness_Score', 'Life_Expectancy')
The world graphs above make it clear that much of Europe and the Americas are doing the best in terms of the metrics of this report. The graphs would lead you to believe that all of Africa and much of Asia have a lot more room for development.
It's hard to see what's going on in Europe at this scale, so let's zoom in a little.
def europeBasemap(df, col1, col2):
    sb.set(style=("white"), font_scale=1.5)
    m = Basemap(projection='mill', llcrnrlat=30, urcrnrlat=72,
                llcrnrlon=-20, urcrnrlon=55, resolution='l')
    m.drawstates()
    m.drawcountries()
    m.drawparallels(np.arange(-90, 91., 30.))
    m.drawmeridians(np.arange(-90, 90., 60.))
    lat = df['Capital_Latitude'].values
    lon = df['Capital_Longitude'].values
    col_color = df[col1].values
    col_size = df[col2].values
    m.scatter(lon, lat, latlon=True, c=col_color, s=250*col_size,
              linewidth=2, edgecolors='black', cmap='hot', alpha=1)
    m.fillcontinents(color='#072B57', lake_color='#FFFFFF', alpha=0.3)
    plt.title('Europe - ' + col1 + ' vs. ' + col2, fontsize=25)
    m.colorbar(label=col1)
plt.figure(figsize=(16, 16))
europeBasemap(report_data_coords, 'Happiness_Score', 'GDP')
plt.figure(figsize=(16, 16))
europeBasemap(report_data_coords, 'Happiness_Score', 'Family')
plt.figure(figsize=(16, 16))
europeBasemap(report_data_coords, 'Happiness_Score', 'Life_Expectancy')
From the Europe maps above, we can see that much of northern and central Europe is faring the best in terms of the metrics, while much of southern Europe is lagging behind.
NOTE: If you are viewing this notebook in nbviewer, the plotly geo-maps will not be rendered because the connections needed to render them are blocked by the site, and I am unable to find a workaround. As a result, if you want to view this notebook in its entirety, you will need to use Binder or a fully functional Jupyter environment instead.
The huge benefit of using plotly is that the maps can be animated and/or have filters applied to view the data a bit more dynamically. It makes it much easier to view data on a timescale.
def plotlyMap(df, col, scope, height):
    slider = [dict(currentvalue={"prefix": "Year: "})]
    fig = px.choropleth(df.sort_values('Year'), locations="ISO_a3", scope=scope.lower(),
                        color=col, animation_frame="Year", animation_group="Country",
                        hover_name="Country", hover_data=["Year", "Rank", "Family", "Life_Expectancy",
                                                          "Gov_Trustworthiness"],
                        color_continuous_scale=px.colors.sequential.haline).update_layout(
                            autosize=False, height=height, width=980, sliders=slider,
                            title_text='Interactive ' + scope.capitalize() + ' Map - ' + col)
    fig.show()
plotlyMap(report_data_coords, 'Happiness_Score', 'world', 600)
plotlyMap(report_data_coords, 'GDP', 'world', 600)
It is notable that there was quite a downturn in world GDP in 2018, which appears to be related to a number of economic factors around the world (article: Economic growth is slowing all around the world).
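That dip can be confirmed with a quick year-over-year difference of the mean GDP metric; a sketch:
# Year-over-year change in the mean GDP metric; 2018 should show a drop
gdp_by_year = report_data.groupby('Year')['GDP'].mean()
print(gdp_by_year.diff().round(3))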
plotlyMap(report_data_coords, 'Family', 'world', 600)
plotlyMap(report_data_coords, 'Life_Expectancy', 'world', 600)
It is very interesting to me to see how the world changes from year to year with this data; being able to quickly look at each year of the data and compare them is very beneficial when doing this kind of analysis.
Just like in Step 5: Europe Maps (Basemap) section, lets zoom in on Europe.
plotlyMap(report_data_coords, 'Happiness_Score', 'europe', 750)
plotlyMap(report_data_coords, 'GDP', 'europe', 750)
As we saw on the Plotly - World GDP map, there was quite a downturn in GDP in 2018. Article: Economic growth is slowing all around the world.
plotlyMap(report_data_coords, 'Family', 'europe', 750)
plotlyMap(report_data_coords, 'Life_Expectancy', 'europe', 750)
From the Europe maps above, we can see that much of northern and central Europe is faring a bit better in terms of the metrics, while eastern and southern Europe are lagging a bit behind.
From the analysis in this notebook, it seems that some of the criticism of "The World Happiness Report" rings true: there is a heavy focus on a country's GDP, along with strongly correlated features such as Family and Life_Expectancy.
It does make sense, to an extent, that not only having money but also having a good social net (Family) is important, and that both make it easier for people to advance in life in whatever direction they choose. This also translates quite well to Life_Expectancy, because a greater ability to provide for yourself (and your Family) means access to better options in general.
Suffice to say, money can indeed buy happiness.